Get Page (Web Mining)

Synopsis

Gets a page via HTTP.

Description

This operator sends a GET request via HTTP. The returned page is output as a document.
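As a rough illustration of what this amounts to, here is a minimal sketch in plain Java. It assumes Java 11+ and the JDK's `java.net.http` client, and it is not the operator's actual implementation. The class name `GetPageSketch` is hypothetical, and returning the response body as a `String` stands in for the operator's document output.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch: send an HTTP GET request and return the page as text. */
final class GetPageSketch {

    private GetPageSketch() {}

    /**
     * Sends a GET request to the given URL and returns the response body.
     * BodyHandlers.ofString() decodes the body with the charset named in the
     * Content-Type response header, falling back to UTF-8 if none is given.
     */
    static String getPage(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // plain HTTP/1.1, no HTTP/2 upgrade attempt
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling `GetPageSketch.getPage("https://example.com/")` would return the HTML of that page as a string; a real operator would additionally wrap it in a document object and apply the parameters listed below.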

Output

  • output

    The output port.

Parameters

  • url

    The URL from which the page is read.

  • random_user_agent

    If enabled, a user agent is chosen at random from a set of 7000 user agents.

  • user_agent

    The user agent sent with the request.

  • connection_timeout

    The timeout, in milliseconds, for the connection.

  • read_timeout

    The timeout, in milliseconds, for reading from the URL.

  • follow_redirects

    Specifies whether redirects are followed.

  • accept_cookies

    Specifies whether cookies are accepted.

  • cookie_scope

    Specifies the scope of the cookies used.

  • request_method

    Specifies the request method.

  • query_parameters

    The query parameters, as key/value pairs.

  • request_properties

    Defines the properties sent with the HTTP request, so that the request matches the needs of your web service.

  • override_encoding

    Normally, the encoding of the retrieved page is determined automatically. In rare cases this fails, or the server reports a wrong encoding. Enable this option to override the automatically detected encoding.

  • encoding

    The encoding used to read the page when override_encoding is enabled.

  • keep_sensitive_headers

    If enabled, the "Authorization" and "Cookie" headers are kept when a redirect leads to a different domain or subdomain.
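Several of these parameters correspond to settings in the JDK's `java.net.http` API. The sketch below is an assumption about how they could be expressed there, not how the operator implements them. `GetPageConfigSketch` and its methods are hypothetical; `read_timeout` is only approximated, because `java.net.http` offers a per-request response timeout rather than a separate read timeout; and `random_user_agent`, `cookie_scope`, `override_encoding`, `encoding`, and `keep_sensitive_headers` are not modeled.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical mapping of some operator parameters onto java.net.http. */
final class GetPageConfigSketch {

    private GetPageConfigSketch() {}

    /** url + query_parameters: appends URL-encoded key=value pairs as a query string. */
    static URI buildUri(String url, Map<String, String> queryParameters) {
        if (queryParameters.isEmpty()) {
            return URI.create(url);
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : queryParameters.entrySet()) {
            // URLEncoder uses form encoding, so a space becomes '+'.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        // Append with '&' if the URL already has a query string, otherwise start one with '?'.
        String separator = url.contains("?") ? "&" : "?";
        return URI.create(url + separator + query);
    }

    /** connection_timeout, follow_redirects, accept_cookies. */
    static HttpClient buildClient(long connectionTimeoutMs, boolean followRedirects,
                                  boolean acceptCookies) {
        HttpClient.Builder builder = HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(connectionTimeoutMs))
                // NORMAL follows redirects except HTTPS-to-HTTP downgrades.
                .followRedirects(followRedirects ? HttpClient.Redirect.NORMAL
                                                 : HttpClient.Redirect.NEVER);
        if (acceptCookies) {
            // In-memory cookie store; its default policy accepts cookies from the original server only.
            builder.cookieHandler(new CookieManager());
        }
        return builder.build();
    }

    /** request_method, user_agent, read_timeout (approximated), request_properties. */
    static HttpRequest buildRequest(URI uri, String method, String userAgent,
                                    long readTimeoutMs, Map<String, String> requestProperties) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri)
                .method(method, HttpRequest.BodyPublishers.noBody())
                // Per-request response timeout, standing in for a read timeout.
                .timeout(Duration.ofMillis(readTimeoutMs))
                .header("User-Agent", userAgent);
        // Each request property becomes an HTTP request header.
        requestProperties.forEach(builder::header);
        return builder.build();
    }
}
```

For example, `buildUri("https://example.com/search", params)` with the ordered pairs `q = web mining` and `lang = en` yields `https://example.com/search?q=web+mining&lang=en`. Keeping URI construction, client settings, and request settings in separate methods mirrors how the parameters split between per-connection and per-request concerns.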